Return candidates from all data sources on id search#6184
Conversation
|
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- The
Candidatestype alias is defined asdict[Info.Identifier, AnyMatch]but then used asCandidates[AlbumMatch]/Candidates[TrackMatch], which isn’t a parametrizable generic; consider either makingCandidatesaTypeAliaswith two type parameters (key/value) or annotating the dicts directly to avoid confusing/misleading typing. - Moving the
album_matchedevent emission intoAlbumMatch.__post_init__makes constructingAlbumMatchobjects have side effects everywhere; consider using a factory/helper (or an explicit method) to emit the event so that simple instantiation stays side-effect-free and easier to reason about. - In
_add_candidate, the duplicate check mixesinfo.album_idandinfo.identifierwhile thecandidatesdict is keyed byidentifier; simplifying this to only useidentifierfor both the truthiness check and the lookup would make the intent clearer and avoid relying onalbum_idbeing non-empty.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `Candidates` type alias is defined as `dict[Info.Identifier, AnyMatch]` but then used as `Candidates[AlbumMatch]`/`Candidates[TrackMatch]`, which isn’t a parametrizable generic; consider either making `Candidates` a `TypeAlias` with two type parameters (key/value) or annotating the dicts directly to avoid confusing/misleading typing.
- Moving the `album_matched` event emission into `AlbumMatch.__post_init__` makes constructing `AlbumMatch` objects have side effects everywhere; consider using a factory/helper (or an explicit method) to emit the event so that simple instantiation stays side-effect-free and easier to reason about.
- In `_add_candidate`, the duplicate check mixes `info.album_id` and `info.identifier` while the `candidates` dict is keyed by `identifier`; simplifying this to only use `identifier` for both the truthiness check and the lookup would make the intent clearer and avoid relying on `album_id` being non-empty.
## Individual Comments
### Comment 1
<location> `beets/autotag/match.py:203-204` </location>
<code_context>
return
# Prevent duplicates.
- if info.album_id and info.album_id in results:
+ if info.album_id and info.identifier in results:
log.debug("Duplicate.")
return
</code_context>
<issue_to_address>
**issue (bug_risk):** Duplicate-prevention now checks album_id but keys are identifier tuples, so it will never filter duplicates.
Since results is keyed by info.identifier (data_source, id), this condition should be based solely on identifier. The album_id guard is now misleading and may skip intended deduping. Consider removing the album_id check and using only `if info.identifier in results:` (or otherwise aligning the condition with how keys are stored).
</issue_to_address>
### Comment 2
<location> `beets/metadata_plugins.py:58-62` </location>
<code_context>
- A single ID can yield just a single track, so we return the first match.
- """
+@notify_info_yielded("trackinfo_received")
+def tracks_for_ids(_id: str) -> Iterable[TrackInfo]:
+ """Return matching albums from all metadata sources for the given ID."""
for plugin in find_metadata_source_plugins():
- if info := plugin.track_for_id(_id):
</code_context>
<issue_to_address>
**nitpick (typo):** Docstring for tracks_for_ids mentions albums instead of tracks.
The description looks copied from `albums_for_ids` and should say "tracks" instead of "albums" to match the function’s purpose and avoid confusing metadata source plugin implementors.
```suggestion
@notify_info_yielded("trackinfo_received")
def tracks_for_ids(_id: str) -> Iterable[TrackInfo]:
"""Return matching tracks from all metadata sources for the given ID."""
for plugin in find_metadata_source_plugins():
yield from plugin.tracks_for_ids([_id])
```
</issue_to_address>
### Comment 3
<location> `beets/autotag/match.py:284-294` </location>
<code_context>
if candidates and not config["import"]["timid"]:
# If we have a very good MBID match, return immediately.
# Otherwise, this match will compete against metadata-based
# matches.
if rec == Recommendation.strong:
log.debug("ID match.")
return (
cur_artist,
cur_album,
Proposal(list(candidates.values()), rec),
)
</code_context>
<issue_to_address>
**suggestion (code-quality):** Merge nested if conditions ([`merge-nested-ifs`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/merge-nested-ifs))
```suggestion
if candidates and not config["import"]["timid"] and rec == Recommendation.strong:
log.debug("ID match.")
return (
cur_artist,
cur_album,
Proposal(list(candidates.values()), rec),
)
```
<br/><details><summary>Explanation</summary>Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.
Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two `if` conditions can be combined using
`and` is an easy win.
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull request overview
This PR refactors the autotag matching system to support returning candidates from multiple metadata sources when searching by ID, and fixes an issue where candidates with duplicate IDs from different sources would overwrite each other.
- Changes metadata plugin API from
album_for_id/track_for_id(returning single results) toalbums_for_ids/tracks_for_ids(yielding multiple results from all sources) - Uses composite
Info.identifier(tuple of data_source and id) as candidate dictionary keys to prevent cross-source ID collisions - Converts
AlbumMatchandTrackMatchfrom NamedTuples to dataclasses and movesalbum_matchedevent emission toAlbumMatch.__post_init__to deduplicate event firing
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/test_autotag.py | Removes assignment tests (moved to new test file) and unused import |
| test/autotag/test_match.py | New test file containing moved assignment tests plus new tests for multi-source ID matching scenarios |
| beets/metadata_plugins.py | Replaces single-result album_for_id/track_for_id functions with multi-result albums_for_ids/tracks_for_ids generators; updates base class method signatures to properly filter None values |
| beets/autotag/match.py | Simplifies match_by_id, updates candidate dictionary to use composite identifiers, removes manual album_matched event calls (now in dataclass), removes unused plugins import |
| beets/autotag/hooks.py | Adds Info.identifier property, converts Match classes to dataclasses with __post_init__ for event emission |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
95fecc5 to
c8c62b3
Compare
a4109de to
7282ede
Compare
7282ede to
e89d97d
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6184 +/- ##
==========================================
+ Coverage 69.42% 69.52% +0.09%
==========================================
Files 141 141
Lines 18452 18475 +23
Branches 3020 3020
==========================================
+ Hits 12811 12844 +33
+ Misses 5004 4997 -7
+ Partials 637 634 -3
🚀 New features to boost your workflow:
|
e89d97d to
079749c
Compare
079749c to
b73d48a
Compare
b73d48a to
835cf4f
Compare
1cf45db to
50fd8cb
Compare
50fd8cb to
396e906
Compare
These functions now accept both an ID and data_source parameter, enabling plugins like mbsync and missing to retrieve metadata from the correct source. Update mbsync and missing plugins to use the restored functions with explicit data_source parameters. Add data_source validation to prevent lookups when the source is not specified. Add get_metadata_source helper function to retrieve plugins by their data_source name, cached for performance.
396e906 to
35361a6
Compare
Closes #6178 (multiple metadata source results per ID) and #6181 (duplicate/overwrite of candidates).
album_for_id/track_for_idwithalbums_for_ids/tracks_for_idsinmetadata_pluginsthat yield candidates from all metadata sourcesInfo.identifier((data_source, id)) as candidate keys to avoid cross-source ID collisions.test/autotag/test_match.py) for assignment logic and multi-source ID matchingmatch_by_idalbum_matchedevent emission by moving it toAlbumMatch.__post_init__(and convertAlbumMatch/TrackMatchto dataclasses)I am refactoring a couple of other things in
beets.autotag.matchmodule because this thing is a hot mess.